A Theoretical and Empirical Analysis of Reward Transformations in Multi-Objective Stochastic Games
Abstract
Reward shaping has been proposed as a means to address the credit assignment problem in Multi-Agent Systems (MAS). Two popular shaping methods are Potential-Based Reward Shaping and difference rewards, and both have been shown to improve learning speed and the quality of joint policies learned by agents in single-objective MAS. In this work we discuss the theoretical implications of applying these approaches to multi-objective MAS, and evaluate their efficacy using a new multi-objective benchmark domain where the true set of Pareto optimal system utilities is known.
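The two shaping methods named above have standard single-objective forms: potential-based shaping adds gamma * Phi(s') - Phi(s) to the environment reward, and a difference reward for agent i is G(z) - G(z_-i), where z_-i is the joint action with agent i's contribution replaced by a default. The Python sketch below applies each form component-wise to a reward vector. Applying them independently per objective is an assumption made here for illustration, not a method confirmed by the abstract, and every name in the code (`pbrs_reward`, `difference_reward`, `toy_utility`) is hypothetical.

```python
from typing import Callable, List, Sequence


def pbrs_reward(reward: Sequence[float],
                phi_s: Sequence[float],
                phi_next: Sequence[float],
                gamma: float = 0.99) -> List[float]:
    """Potential-based shaping, r + gamma * Phi(s') - Phi(s), per objective.

    Assumption: one potential value per objective, shaped independently.
    """
    return [r + gamma * pn - p for r, p, pn in zip(reward, phi_s, phi_next)]


def difference_reward(global_utility: Callable[[Sequence[int]], Sequence[float]],
                      joint_action: Sequence[int],
                      agent: int,
                      default_action: int) -> List[float]:
    """Difference reward D_i = G(z) - G(z_-i), per objective.

    z_-i is the joint action with the given agent's action replaced by a
    default (counterfactual) action.
    """
    counterfactual = list(joint_action)
    counterfactual[agent] = default_action
    g = global_utility(joint_action)
    g_cf = global_utility(counterfactual)
    return [a - b for a, b in zip(g, g_cf)]


def toy_utility(z: Sequence[int]) -> List[float]:
    """Hypothetical two-objective system utility used only for this example."""
    return [float(sum(z)), float(-max(z))]


shaped = pbrs_reward([1.0, 0.0], phi_s=[0.5, 0.0], phi_next=[1.0, 2.0], gamma=0.5)
d_agent2 = difference_reward(toy_utility, [1, 2, 3], agent=2, default_action=0)
```

In this toy setting, `d_agent2` isolates how much the third agent's action changes each objective of the system utility, which is the credit-assignment signal that difference rewards are meant to provide.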
Similar Resources
Analysing the Effects of Reward Shaping in Multi-Objective Stochastic Games
The majority of Multi-Agent Reinforcement Learning (MARL) implementations aim to optimise systems with respect to a single objective, despite the fact that many real world problems are inherently multi-objective in nature. Research into multi-objective MARL is still in its infancy, and few studies to date have dealt with the issue of credit assignment. Reward shaping has been proposed as a mean...
Multi-item inventory model with probabilistic demand function under permissible delay in payment and fuzzy-stochastic budget constraint: A signomial geometric programming method
This study proposes a new multi-item inventory model with hybrid cost parameters under a fuzzy-stochastic constraint and permissible delay in payment. Stochastic demand that depends on price and marketing expenditure, and a unit production cost that depends on demand, are considered. Shortages are allowed and partially backordered. The main objective of this paper is to determine selling price, mar...
Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations
This paper considers the problem of inverse reinforcement learning in zero-sum stochastic games when expert demonstrations are known to be not optimal. Compared to previous works that decouple agents in the game by assuming optimality in expert strategies, we introduce a new objective function that directly pits experts against Nash Equilibrium strategies, and we design an algorithm to solve fo...
Policy Invariance under Reward Transformations for General-Sum Stochastic Games
We extend the potential-based shaping method from Markov decision processes to multi-player general-sum stochastic games. We prove that the Nash equilibria in a stochastic game remain unchanged after potential-based shaping is applied to the environment. The property of policy invariance provides a possible way of speeding convergence when learning to play a stochastic game.
Designing a new multi-objective fuzzy stochastic DEA model in a dynamic environment to estimate efficiency of decision making units (Case Study: An Iranian Petroleum Company)
This paper presents a new multi-objective fuzzy stochastic data envelopment analysis model (MOFS-DEA) under mean chance constraints and common weights to estimate the efficiency of decision making units over their future financial periods. In the initial MOFS-DEA model, the outputs and inputs are characterized by random triangular fuzzy variables with normal distribution, in which ...